========================================================
## 'data.frame': 164206 obs. of 20 variables:
## $ cmte_id : chr "C00575795" "C00577130" "C00574624" "C00574624" ...
## $ cand_id : Factor w/ 25 levels "P00003392","P20002671",..: 1 12 10 10 10 12 10 12 12 12 ...
## $ cand_nm : Factor w/ 25 levels "Bush, Jeb","Carson, Benjamin S.",..: 4 20 5 5 5 20 5 20 20 20 ...
## $ contbr_nm : Factor w/ 34113 levels ";HEIMAN, JAMES",..: 28580 17750 14126 14126 14126 17767 14126 17928 18142 18142 ...
## $ contbr_city : Factor w/ 616 levels ""," SUPERIOR",..: 32 79 226 226 226 199 226 233 199 199 ...
## $ contbr_st : Factor w/ 1 level "CO": 1 1 1 1 1 1 1 1 1 1 ...
## $ contbr_zip : Factor w/ 20491 levels "","00124","00125",..: 815 1386 13591 13591 13591 18622 13591 14231 18642 18642 ...
## $ contbr_employer : Factor w/ 10432 levels "","-","--NA",..: 8093 9630 5408 5408 5408 7145 5408 6548 6548 6548 ...
## $ contbr_occupation: Factor w/ 6000 levels "","'CALL CENTER AGEN",..: 5497 5234 1722 1722 1722 1670 1722 3427 3427 3427 ...
## $ contb_receipt_amt: num 100 50 50 100 100 15 100 15 10 10 ...
## $ contb_receipt_dt : Factor w/ 632 levels "01-APR-15","01-APR-16",..: 2 94 22 267 476 94 600 114 94 114 ...
## $ receipt_desc : Factor w/ 24 levels "","* EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_cd : Factor w/ 2 levels "","X": 2 1 1 1 1 1 1 1 1 1 ...
## $ memo_text : Factor w/ 95 levels "","$0.83 REFUNDED ON 10/25/2016",..: 19 14 1 1 1 14 1 14 14 14 ...
## $ form_tp : Factor w/ 3 levels "SA17A","SA18",..: 2 1 1 1 1 1 1 1 1 1 ...
## $ file_num : int 1091718 1077404 1077664 1077664 1077664 1077404 1077664 1077404 1077404 1077404 ...
## $ tran_id : Factor w/ 163919 levels "A001497D386DA4906B87",..: 48653 129137 92336 93516 94136 129425 94739 129673 129350 129609 ...
## $ election_tp : Factor w/ 4 levels "","G2016","P2016",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ zip : chr "80014" "80020" "80504" "80504" ...
## $ ZCTA_USE : chr "80014" "80020" "80504" "80504" ...
We have 164206 observations of 20 variables.
Let’s add party so we can roll the data up a bit; the Republican field was busy in 2016. The idea for the party logic comes from here: http://stackoverflow.com/questions/4622060/case-statement-equivalent-in-r
As we don’t have age data, I’d like to be able to compare retirees to non-retirees, so I’m adding a variable for retiree status.
## cmte_id cand_id cand_nm contbr_nm
## 1 C00575795 P00003392 Clinton, Hillary Rodham SMITH, KERRI
## 2 C00577130 P60007168 Sanders, Bernard LEONAWICZ, MATTHEW
## 3 C00574624 P60006111 Cruz, Rafael Edward 'Ted' HOWELL, DIRK E. MR.
## 4 C00574624 P60006111 Cruz, Rafael Edward 'Ted' HOWELL, DIRK E. MR.
## 5 C00574624 P60006111 Cruz, Rafael Edward 'Ted' HOWELL, DIRK E. MR.
## 6 C00577130 P60007168 Sanders, Bernard LERNER, LUKAS
## contbr_city contbr_st contbr_zip contbr_employer
## 1 AURORA CO 800144038 SELF-EMPLOYED
## 2 BROOMFIELD CO 800209751 UNIVERSITY OF ALASKA FAIRBANKS
## 3 FIRESTONE CO 805043508 LGS INNOVATIONS
## 4 FIRESTONE CO 805043508 LGS INNOVATIONS
## 5 FIRESTONE CO 805043508 LGS INNOVATIONS
## 6 DURANGO CO 813014721 POWERHOUSE ELECTRIC
## contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc
## 1 TEMPORARY STAFFING 100 01-APR-16
## 2 STATISTICIAN 50 05-MAR-16
## 3 ENGINEER 50 02-APR-16
## 4 ENGINEER 100 14-APR-16
## 5 ENGINEER 100 24-APR-16
## 6 ELECTRICIAN 15 05-MAR-16
## memo_cd memo_text form_tp file_num
## 1 X * HILLARY VICTORY FUND SA18 1091718
## 2 * EARMARKED CONTRIBUTION: SEE BELOW SA17A 1077404
## 3 SA17A 1077664
## 4 SA17A 1077664
## 5 SA17A 1077664
## 6 * EARMARKED CONTRIBUTION: SEE BELOW SA17A 1077404
## tran_id election_tp zip ZCTA_USE cand_party retiree_status
## 1 C4666313 P2016 80014 80014 democrat not_retired
## 2 VPF7BKWYW45 P2016 80020 80020 democrat not_retired
## 3 SA17A.1536976 P2016 80504 80504 republican not_retired
## 4 SA17A.1663942 P2016 80504 80504 republican not_retired
## 5 SA17A.1715938 P2016 80504 80504 republican not_retired
## 6 VPF7BKY1A44 P2016 81301 81301 democrat not_retired
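The code that built `cand_party` and `retiree_status` is not shown in this output. A minimal base R sketch of one way to do it follows, using a named vector as the case-statement equivalent. The toy rows, the `party_lookup` vector, and the rule that only an occupation of exactly "RETIRED" counts as retired are assumptions for illustration, not the original logic.

```r
# Toy rows standing in for the contribution data; column names follow the str() output.
contribs <- data.frame(
  cand_nm = c("Clinton, Hillary Rodham", "Sanders, Bernard",
              "Cruz, Rafael Edward 'Ted'", "Johnson, Gary"),
  contbr_occupation = c("TEMPORARY STAFFING", "RETIRED", "ENGINEER", NA),
  stringsAsFactors = FALSE
)

# Hypothetical crosswalk: a named vector keyed by candidate name.
# Only a handful of the 25 candidates are listed here.
party_lookup <- c(
  "Clinton, Hillary Rodham"   = "democrat",
  "Sanders, Bernard"          = "democrat",
  "Cruz, Rafael Edward 'Ted'" = "republican",
  "Trump, Donald J."          = "republican",
  "Johnson, Gary"             = "libertarian"
)

# Indexing the named vector acts as the case statement; unmatched names become NA.
contribs$cand_party <- unname(party_lookup[as.character(contribs$cand_nm)])

# Assumption: only an occupation of exactly "RETIRED" counts as retired.
# %in% returns FALSE for NA, so missing occupations fall into "not_retired".
contribs$retiree_status <- ifelse(contribs$contbr_occupation %in% "RETIRED",
                                  "retired", "not_retired")

print(contribs[, c("cand_nm", "cand_party", "retiree_status")])
```

A lookup vector keeps the manual crosswalk in one place, which makes it easy to spot a candidate who was left out (their party shows up as NA).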
That’s a lot of receipts with a 0 amount. Exclude contributions below zero to zoom in a bit.
Are there zip codes more likely to contribute?
Too specific. Are there cities more likely to contribute? Taking the log10 of donation frequency as Boulder and Denver donated more by orders of magnitude.
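As a rough illustration of the log10 step, here is a base R sketch on made-up city counts (the `city` vector is a toy stand-in for `contbr_city`, and the plotting call is only a suggestion, not the original chart code).

```r
# Toy city column; in the report this would be contbr_city.
city <- c(rep("DENVER", 1000), rep("BOULDER", 100), rep("SUPERIOR", 10))

# Count contributions per city and sort from most to least frequent.
city_counts <- sort(table(city), decreasing = TRUE)

# log10 turns each tenfold gap in frequency into one unit on the axis,
# so small towns stay visible next to the large cities.
log_counts <- log10(as.numeric(city_counts))
names(log_counts) <- names(city_counts)

print(log_counts)

# One possible way to draw it (left commented out so the sketch has no side effects):
# barplot(log_counts, las = 2, ylab = "log10(number of contributions)")
```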
Are there occupations that are more likely to contribute?
The largest groups are retirees. I did not expect retirees to be donating so actively. Which candidate received more donations? I’ve heard Colorado is fairly purple.
Let’s check that by party
Histogram is super skewed, trying a pie chart.
## cmte_id cand_id cand_nm
## Length:164206 P00003392:70843 Clinton, Hillary Rodham :70843
## Class :character P60007168:53689 Sanders, Bernard :53689
## Mode :character P80001571:13718 Trump, Donald J. :13718
## P60006111:12953 Cruz, Rafael Edward 'Ted':12953
## P60005915: 7112 Carson, Benjamin S. : 7112
## P60006723: 2201 Rubio, Marco : 2201
## (Other) : 3690 (Other) : 3690
## contbr_nm contbr_city contbr_st
## LENELL, MATT : 390 DENVER :32215 CO:164206
## IMMASCHE, SONIA : 356 BOULDER :14936
## SMITH, PHILIP : 320 COLORADO SPRINGS:11039
## CASPERSON, CAROLINA: 304 FORT COLLINS : 7494
## RAMSEY, WILLIAM : 261 AURORA : 6109
## HOFFMAN, TONI : 240 LITTLETON : 5461
## (Other) :162335 (Other) :86952
## contbr_zip contbr_employer contbr_occupation
## 804212057: 390 N/A :23483 RETIRED :38933
## 805241517: 356 RETIRED :20878 NOT EMPLOYED :16446
## 802212506: 322 SELF-EMPLOYED:12233 INFORMATION REQUESTED: 3861
## 802034502: 304 NONE :12047 ATTORNEY : 3229
## 814015616: 300 NOT EMPLOYED : 6761 TEACHER : 2536
## 80504 : 245 (Other) :88626 (Other) :99169
## (Other) :162289 NA's : 178 NA's : 32
## contb_receipt_amt contb_receipt_dt
## Min. :-16300.0 Min. :2014-08-16
## 1st Qu.: 15.0 1st Qu.:2016-02-28
## Median : 27.0 Median :2016-05-06
## Mean : 102.6 Mean :2016-05-11
## 3rd Qu.: 75.0 3rd Qu.:2016-08-18
## Max. : 18000.0 Max. :2016-11-27
##
## receipt_desc memo_cd
## :162356 :139603
## Refund : 883 X: 24603
## REDESIGNATION FROM PRIMARY : 232
## REDESIGNATION TO GENERAL : 225
## REDESIGNATION TO CRUZ FOR SENATE: 144
## SEE REDESIGNATION : 116
## (Other) : 250
## memo_text form_tp
## :98139 SA17A:139760
## * EARMARKED CONTRIBUTION: SEE BELOW:52328 SA18 : 23563
## * HILLARY VICTORY FUND :12312 SB28A: 883
## REDESIGNATION FROM PRIMARY : 232
## REDESIGNATION TO GENERAL : 225
## EARMARKED FROM MAKE DC LISTEN : 163
## (Other) : 807
## file_num tran_id election_tp zip
## Min. :1003942 C10182166: 2 : 400 Length:164206
## 1st Qu.:1077648 C10233294: 2 G2016: 52017 Class :character
## Median :1093618 C10233584: 2 P2016:111788 Mode :character
## Mean :1095013 C10248517: 2 P2020: 1
## 3rd Qu.:1119042 C10258303: 2
## Max. :1134173 C10262748: 2
## (Other) :164194
## ZCTA_USE cand_party retiree_status
## Length:164206 democrat :124675 Length:164206
## Class :character republican : 38751 Class :character
## Mode :character libertarian: 502 Mode :character
## green : 219
## independent: 59
##
##
Some of these people were contributing early and often. I’ll compare those to the 2015-2016 contribution limits here later: http://www.fec.gov/info/contriblimitschart1516.pdf
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -16300.0 15.0 27.0 102.6 75.0 18000.0
The histogram on contributions had a large group around zero – are there a large number of zero contributions?
## [1] 5
Not a lot of zero contributions, interesting. How long did the donation season extend?
There was some news speculation that large donors were waiting for the primaries to thin out.
## Min. 1st Qu. Median Mean 3rd Qu.
## "2014-08-16" "2016-02-28" "2016-05-06" "2016-05-11" "2016-08-18"
## Max.
## "2016-11-27"
What about the memos on the donations? That may have some interesting information.
## # A tibble: 95 × 2
## memo_text n
## <fctr> <int>
## 1 98139
## 2 * EARMARKED CONTRIBUTION: SEE BELOW 52328
## 3 * HILLARY VICTORY FUND 12312
## 4 REDESIGNATION FROM PRIMARY 232
## 5 REDESIGNATION TO GENERAL 225
## 6 EARMARKED FROM MAKE DC LISTEN 163
## 7 REDESIGNATION TO CRUZ FOR SENATE 144
## 8 *BEST EFFORTS UPDATE 135
## 9 SEE REDESIGNATION 116
## 10 REATTRIBUTION / REDESIGNATION REQUESTED 69
## # ... with 85 more rows
The data is normalized: there’s one entry per donation (or refund).
We could roll up to see how much each candidate received as a whole, or how much specific earmarks received, and drill down to see if someone in the state went over a contribution limit.
Each candidate is ID’d, and each contributor is specifically named, along with the date and the amount of their contribution. Any memos with specific earmarks or refunds are designated clearly.
Yes, I created variables for party affiliation, retiree status, and ZCTA codes.
I transformed the date into date format to make summary analysis simpler on that variable. There were a lot of donations that seemed centered around zero. I excluded the top 1% while looking at these so I could see the data a bit closer; there are only 5 actual zero contributions. The others in the near-zero range appear to be micro-transactions and not actually zero.
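The three steps described here (parsing the dates, counting exact zeros, trimming the top 1%) could be sketched in base R roughly as follows. The toy rows and the new column name `receipt_date` are assumptions, and whether the original trimmed with `quantile()` is not known.

```r
# Toy data in the same shape as contb_receipt_dt and contb_receipt_amt.
contribs <- data.frame(
  contb_receipt_dt  = c("01-APR-16", "05-MAR-16", "16-AUG-14", "27-NOV-16"),
  contb_receipt_amt = c(100, 0, 15, 18000),
  stringsAsFactors = FALSE
)

# Parse DD-MON-YY strings into Date values. %b matches abbreviated month
# names in the current locale, so this assumes an English locale.
contribs$receipt_date <- as.Date(contribs$contb_receipt_dt, format = "%d-%b-%y")

# Count contributions that are exactly zero.
n_zero <- sum(contribs$contb_receipt_amt == 0)

# Keep non-negative amounts at or below the 99th percentile for a closer look.
top_cut <- quantile(contribs$contb_receipt_amt, 0.99)
trimmed <- subset(contribs, contb_receipt_amt >= 0 & contb_receipt_amt <= top_cut)

print(summary(contribs$receipt_date))
print(n_zero)
print(nrow(trimmed))
```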
Did the giving patterns (many small donations, few large donations) vary across candidates? I’m using log10 because these are financial amounts, and different candidates (naturally) received very different amounts in this state. I’m not interested in how much they received in total, but in what that distribution looks like.
Roll that up by party, keeping the log scale because the totals coming in to the various parties differ widely.
Excluding refunds, let’s look at donations by party (also excluding the top 1%). Do specific parties get more high-dollar donations?
Did larger/smaller donations come in at different points in the election cycle? Using ylim to exclude some outliers
Which party earmarked more funds?
I’ve heard there’s a strong rural/urban funding divide – visible in this data?
This seems to reflect the histogram of cities showing that most of the money came from urban areas.
There were several miscellaneous refunds in the memo data in the univariate section.
## # A tibble: 35 × 3
## memo_text n
## <fctr> <int>
## 1 * EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING 23
## 2 REFUNDED ON 10/24/2016 9
## 3 REFUND TO BE ISSUED 6
## 4 * EARMARKED CONTRIBUTION: SEE BELOW REFUNDED ON 10/24/2016 3
## 5 REFUNDED ON 10/10/2016 3
## 6 REATTRIBUTION/REFUND PENDING 2
## 7 REFUNDED $1000.00 ON 12/29/2015 2
## 8 REFUNDED ON 10/18/2016 2
## 9 REFUNDED ON 7/12/2016 2
## 10 $0.83 REFUNDED ON 10/25/2016 1
## # ... with 25 more rows, and 1 more variables: meanrefund <dbl>
How different were the average amounts for each candidate? The story in the news was that the democratic candidates were receiving more money through more frequent, smaller donations.
## # A tibble: 25 × 6
## cand_nm total_donation mean_donation median_donation
## <fctr> <dbl> <dbl> <dbl>
## 1 Clinton, Hillary Rodham 8544967.4 120.61837 25
## 2 Sanders, Bernard 2298527.4 42.81189 27
## 3 Trump, Donald J. 2456085.8 179.04110 60
## 4 Cruz, Rafael Edward 'Ted' 1011921.4 78.12255 50
## 5 Carson, Benjamin S. 694383.1 97.63541 50
## 6 Rubio, Marco 632342.2 287.29768 100
## 7 Fiorina, Carly 223509.6 236.26813 100
## 8 Paul, Rand 105243.5 160.67716 50
## 9 Johnson, Gary 128571.1 256.11771 100
## 10 Bush, Jeb 276847.0 692.11750 250
## # ... with 15 more rows, and 2 more variables: stdev_donation <dbl>,
## # n <int>
Data in this reflects the news story! How different were average amounts by party? Since Dean’s presidential campaign, dems have supposedly had an edge on smaller/frequent donations.
## # A tibble: 5 × 5
## cand_party total_donation mean_donation median_donation n
## <fctr> <dbl> <dbl> <dbl> <int>
## 1 democrat 10930322.95 87.67053 25 124675
## 2 republican 5762403.22 148.70334 50 38751
## 3 libertarian 128571.09 256.11771 100 502
## 4 green 21388.27 97.66333 50 219
## 5 independent 12156.00 206.03390 100 59
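The summary tables above look like grouped totals, means, medians, and counts. The original code is not shown, so the following is only a base R sketch of the same kind of roll-up on toy amounts; `summarise_group` is a hypothetical helper name.

```r
# Toy contributions; column names follow the report's data frame.
contribs <- data.frame(
  cand_party = c("democrat", "democrat", "democrat", "republican", "republican"),
  contb_receipt_amt = c(25, 27, 500, 50, 250),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one summary row for a vector of amounts.
summarise_group <- function(amounts) {
  data.frame(
    total_donation  = sum(amounts),
    mean_donation   = mean(amounts),
    median_donation = median(amounts),
    n               = length(amounts)
  )
}

# Split the amounts by party, summarise each piece, and stack the rows.
pieces   <- split(contribs$contb_receipt_amt, contribs$cand_party)
by_party <- do.call(rbind, lapply(pieces, summarise_group))
by_party <- data.frame(cand_party = rownames(by_party), by_party,
                       row.names = NULL, stringsAsFactors = FALSE)

# Largest total first, as in the tables above.
by_party <- by_party[order(-by_party$total_donation), ]
print(by_party)
```

Swapping `cand_party` for `cand_nm` or `contbr_city` in the `split()` call gives the per-candidate and per-city versions.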
Donations by city? Do rural areas contribute less to political campaigns because high-paying jobs are often concentrated in urban areas?
## # A tibble: 616 × 5
## contbr_city total_donation mean_donation median_donation n
## <fctr> <dbl> <dbl> <dbl> <int>
## 1 DENVER 4190800.7 130.08849 27 32215
## 2 BOULDER 1752512.3 117.33478 27 14936
## 3 COLORADO SPRINGS 1018770.6 92.28830 30 11039
## 4 FORT COLLINS 520008.9 69.39003 25 7494
## 5 ENGLEWOOD 461519.4 192.54043 35 2397
## 6 LITTLETON 454958.3 83.31044 27 5461
## 7 ASPEN 432404.5 428.54759 40 1009
## 8 AURORA 376474.7 61.62624 25 6109
## 9 GREENWOOD VILLAGE 333703.3 324.93023 75 1027
## 10 CENTENNIAL 330993.5 100.94341 28 3279
## # ... with 606 more rows
First, I saw a story that reported researchers using national donation data to find people who donated more than the contribution limit for the 2015-2016 federal elections. Let’s see if that’s visible in this data.
## Source: local data frame [671 x 4]
## Groups: contbr_nm [668]
##
## contbr_nm cand_nm total_donation n
## <fctr> <fctr> <dbl> <int>
## 1 FARKAS, JOEL Rubio, Marco 18900 7
## 2 JABS, JACOB MR. Rubio, Marco 13500 5
## 3 BARRON, CURRIE Clinton, Hillary Rodham 10800 4
## 4 DOUD, BEN MR. Rubio, Marco 10800 4
## 5 FARKAS, DANA MRS. Rubio, Marco 10800 4
## 6 SREDNICKI, RICHARD Trump, Donald J. 9900 4
## 7 BARRON, THOMAS A. MR. Graham, Lindsey O. 8100 6
## 8 CONGDON, NOEL R. Clinton, Hillary Rodham 8100 3
## 9 CONGDON, THOMAS Clinton, Hillary Rodham 8100 5
## 10 COORS, JEFFREY Cruz, Rafael Edward 'Ted' 8100 5
## # ... with 661 more rows
There are a lot of donations above the individual-to-candidate maximum; hopefully these are PAC donations that were filed as ‘for’ a specific candidate.
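A check like this can be sketched in base R by summing each contributor's donations per candidate and filtering on a threshold. The contributor names below are placeholders, and the `limit` of 2,700 dollars is an assumption based on my reading of the 2015-2016 per-election limit for individuals giving to a candidate committee; the threshold the original analysis used is not shown.

```r
# Toy contributions with placeholder names (not real donors).
contribs <- data.frame(
  contbr_nm = c("DOE, JANE", "DOE, JANE", "DOE, JANE", "ROE, RICHARD", "ROE, RICHARD"),
  cand_nm   = c("Candidate A", "Candidate A", "Candidate B", "Candidate A", "Candidate A"),
  contb_receipt_amt = c(2700, 2700, 500, 1000, -200),
  stringsAsFactors = FALSE
)

# Assumed threshold in dollars; adjust to whichever limit applies.
limit <- 2700

# Total amount and number of receipts per contributor/candidate pair.
totals <- aggregate(contb_receipt_amt ~ contbr_nm + cand_nm, data = contribs, FUN = sum)
names(totals)[3] <- "total_donation"
counts <- aggregate(contb_receipt_amt ~ contbr_nm + cand_nm, data = contribs, FUN = length)
names(counts)[3] <- "n"
per_pair <- merge(totals, counts, by = c("contbr_nm", "cand_nm"))

# Keep only pairs whose net total exceeds the threshold, largest first.
over_limit <- per_pair[per_pair$total_donation > limit, ]
over_limit <- over_limit[order(-over_limit$total_donation), ]
print(over_limit)
```

Two caveats on this sketch: matching on name alone can merge different people who share a name, and because the limit applies per election, a total across primary and general can exceed it without any single election doing so. Adding `election_tp` to the grouping would separate those cases.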
News stories frequently reported an age difference between the parties’ donors. Is that reflected here?
## # A tibble: 10 × 6
## retiree_status cand_party total_donation mean_donation median_donation
## <chr> <fctr> <dbl> <dbl> <dbl>
## 1 not_retired democrat 8659734.69 88.92450 27.0
## 2 not_retired republican 3794510.73 171.51106 50.0
## 3 not_retired libertarian 117956.59 262.12576 100.0
## 4 not_retired green 15388.27 93.26224 50.0
## 5 not_retired independent 8288.50 184.18889 100.0
## 6 retired democrat 2270588.26 83.19611 25.0
## 7 retired republican 1967892.49 118.35523 50.0
## 8 retired libertarian 10614.50 204.12500 100.0
## 9 retired green 6000.00 111.11111 50.0
## 10 retired independent 3867.50 276.25000 37.5
## # ... with 1 more variables: n <int>
There’s often an age split discussed in political news. Are retirees in Colorado more likely to be democrat or republican?
## [1] 0.2
Phi is 0.2, which is fairly close to zero, so there is at most a weak association between retiree status and party in the Colorado donor data.
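For reference, the phi coefficient of a 2x2 table with cells a, b (first row) and c, d (second row) is (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)). The function that produced the value above is not shown; the sketch below is a hand-rolled base R version on toy data, restricted to two parties because phi is only defined for a 2x2 table. `phi_coefficient` is a hypothetical helper name.

```r
# Hypothetical helper: phi coefficient for a 2x2 contingency table.
phi_coefficient <- function(tab) {
  stopifnot(length(dim(tab)) == 2, all(dim(tab) == c(2, 2)))
  # Convert to numeric so large counts do not overflow integer arithmetic.
  counts <- matrix(as.numeric(tab), nrow = 2)
  n11 <- counts[1, 1]; n12 <- counts[1, 2]
  n21 <- counts[2, 1]; n22 <- counts[2, 2]
  (n11 * n22 - n12 * n21) /
    sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
}

# Toy data: retiree status and party for six contributions.
retiree <- c("retired", "retired", "not_retired", "not_retired", "retired", "not_retired")
party   <- c("republican", "republican", "democrat", "democrat", "democrat", "republican")

tab <- table(retiree, party)
print(tab)

phi <- phi_coefficient(tab)
print(round(phi, 2))
```

Values near 0 mean the two variables are close to independent, while values near 1 or -1 mean one almost determines the other. The sign depends on how the table's rows and columns are ordered.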
Were earmarked funds more likely to be higher amounts?
## # A tibble: 2 × 5
## `if_else(grepl("EARMARK", memo_text), ...` total_donation mean_donation
## <chr> <dbl> <dbl>
## 1 Earkmarked 2262311 43.06129
## 2 Not Earmarked 14592530 130.67665
## # ... with 2 more variables: median_donation <dbl>, n <int>
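The column header in the table above shows that the flag came from `grepl("EARMARK", memo_text)`. A small base R sketch of the same idea on toy memo rows follows (the column name `earmark_status` is an assumption).

```r
# Toy memo text and amounts; the memo strings are taken from the table of memos.
contribs <- data.frame(
  memo_text = c("* EARMARKED CONTRIBUTION: SEE BELOW", "",
                "EARMARKED FROM MAKE DC LISTEN", "* HILLARY VICTORY FUND"),
  contb_receipt_amt = c(27, 250, 15, 100),
  stringsAsFactors = FALSE
)

# Flag any memo containing the literal text "EARMARK".
contribs$earmark_status <- ifelse(
  grepl("EARMARK", contribs$memo_text, fixed = TRUE),
  "Earmarked", "Not Earmarked"
)

# Mean amount and number of receipts per group.
earmark_means  <- tapply(contribs$contb_receipt_amt, contribs$earmark_status, mean)
earmark_counts <- table(contribs$earmark_status)

print(earmark_means)
print(earmark_counts)
```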
Were Republicans/Democrats more likely to earmark funds?
## [1] 0.38
Phi is 0.38, a bit further from zero, which indicates a weak positive association between party and earmarking funds in the Colorado donor data.
I found it really interesting that the Democratic party did fundraise successfully using smaller, more frequent, donations than the Republican party. I had read articles and heard news stories alleging this was their tactic, and it’s interesting to see how that strategy was successful in Colorado.
I was previously unaware of how many refunds existed in election fundraising.
It was interesting, but not incredibly surprising, to see the total funding by region mirror population density.
There was a weak positive association between earmarks and Democrats: Democrats were more likely to use earmarks. Combining this with the summarised earmark data, earmarked donations were also for lower amounts than the average donation.
Did some people contribute more frequently, and were their contribution totals higher than those who contributed less often?
How do financial amounts vary for retirees vs non-retirees with political party?
Were different parties getting more for retirees vs non-retirees?
Was there a strong divide for rural/urban on party line funding?
That wasn’t very visually helpful–let’s look at the spread between the two specifically
Were larger funds earmarked?
Interesting, it looks like smaller funds were more often earmarked.
Did the retiree/non-retiree donation split fall hard along party lines in terms of total money donated?
## [1] 0.15
Phi is 0.15, so there is little or no association between retiree status and total donation in the Colorado donor data. Is there a relationship with the mean donation?
## [1] -0.07
Phi is -0.07, which is even closer to zero, so there is an even weaker association between the mean amount donated and retiree status.
Is there a relationship between donation frequency & amount?
##
## Calls:
## m1: lm(formula = I(total_donation) ~ I(number_of_donations), data = frequent_contributors)
## m2: lm(formula = I(total_donation) ~ I(number_of_donations) + cand_party,
## data = frequent_contributors)
##
## ==================================================================
## m1 m2
## ------------------------------------------------------------------
## (Intercept) 393.599*** 492.426***
## (5.170) (7.283)
## I(number_of_donations) 17.367*** 14.862***
## (0.513) (0.527)
## cand_party: republican/democrat -183.081***
## (9.498)
## cand_party: libertarian/democrat -33.675
## (53.518)
## cand_party: green/democrat -273.951**
## (94.900)
## cand_party: independent/democrat -116.455
## (157.603)
## ------------------------------------------------------------------
## R-squared 0.0 0.0
## adj. R-squared 0.0 0.0
## sigma 866.9 862.4
## F 1145.8 306.9
## p 0.0 0.0
## Log-likelihood -291157.0 -290969.7
## Deviance 26736283722.0 26456217911.4
## AIC 582320.0 581953.4
## BIC 582345.5 582012.7
## N 35577 35577
## ==================================================================
That is an awful model: R-squared rounds to 0.0 for both versions. I’m not seeing a predictable relationship between the variables here; I suspect there are hidden motivators not in this data set.
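The two formulas in the `Calls:` block can be fitted with base R's `lm()`. The sketch below uses simulated data in place of `frequent_contributors` (the simulated numbers are invented purely so the code runs) and pulls out R-squared, which is the quickest way to see how little the predictors explain.

```r
set.seed(42)
n <- 200

# Simulated stand-in for frequent_contributors: mostly noise by construction.
frequent_contributors <- data.frame(
  number_of_donations = sample(1:30, n, replace = TRUE),
  cand_party = factor(sample(c("democrat", "republican"), n, replace = TRUE))
)
frequent_contributors$total_donation <-
  400 + 15 * frequent_contributors$number_of_donations + rnorm(n, sd = 800)

# m1: frequency only; m2: frequency plus party as an additive term.
m1 <- lm(total_donation ~ number_of_donations, data = frequent_contributors)
m2 <- lm(total_donation ~ number_of_donations + cand_party, data = frequent_contributors)

# Share of variance explained by each model.
r2 <- c(m1 = summary(m1)$r.squared, m2 = summary(m2)$r.squared)
print(round(r2, 3))
```

Note that `+ cand_party` is an additive term; a formula with `number_of_donations * cand_party` would be needed to let the slope differ by party.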
Donations mostly came from urban areas. Urban areas were also more likely to be donating Democratic. Combined, the urban areas were donating significantly more to the Democratic party.
It was also interesting that retirees were donating higher amounts to the Republican party. As a whole, retirees donated almost purple–the sum of democrat donations vs the sum of republican donations is very similar as a part of the total donations in Colorado.
I created a model and was completely unable to predict how much someone would donate based on the frequency of donations and the candidate’s party.
The majority of the money donated in the 2016 election by Colorado residents went to Democratic candidates: 10.93 million dollars Democratic to 5.76 million Republican.
The Democrat/Republican donation split for retirees in Colorado is almost purple, barely leaning Democratic. However, the non-retiree population’s donations are financially heavily Democratic. This fits in nicely with the local wisdom that the urban areas vote Democratic, while the prairie votes Republican.
I overlaid the map to show how high population areas and ‘resort’ areas donated significantly more funds to Democratic candidates than Republican candidates in the 2016 election cycle. The N/A areas are areas with very low population. I’m using ZCTA instead of true Zip code because choropleth runs off ZCTA, as does census data. In this graph, the bluest areas are the areas with the highest funding difference in favor of the Democratic party.
I validated several stories that had been trending in the news during the election cycle in this data set.
I often heard Colorado referred to as ‘purple’, leaning between Democratic and Republican, so I checked to see if the financial donations reflected that even split. This was not the case in the overall donation sums between the two parties.
In detail, I verified the news story that the Democratic funding strategy successfully raised more money through more frequent, smaller donations, while the Republican party went for larger, less frequent donations. The median Colorado donation was 25 dollars for Democrats and 50 dollars for Republicans (means of roughly 88 and 149 dollars). In the violin chart, there’s a large visual difference between the two parties in the large-amount donations.
I suspected the difference between the financials could be due to the difference between older and younger contributors, as the millennial vote was strongly Democratic this election season. That was validated in the data: the split between retirees was almost even, while the split between non-retirees was rather wide, at 8659734.69 dollars to 3794510.73 dollars.
Lastly, I’ve heard that the red/blue split is heavily influenced by geography, with Republican voters living in the prairie and Democratic voters living in the mountains and highly populated urban areas. This was validated in the data as seen in the difference between blue/Democratic fundraising and white/Republican fundraising in the choropleth map. In future analysis, I’d like to combine this data with some reliable census data and look at donations vs the average family income and population. With this, I could look at statistics for how the more heavily populated or higher income areas leaned toward various parties.
Technically speaking, I found the greatest challenge in the choropleth maps. I was highly interested in seeing how donations were spread geographically, and it took some time to find an appropriate library and documentation to do that in R. I had to download a crosswalk for zip codes to ZCTA codes, as the R library relies on ZCTA codes and the data included zip codes.
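The zip-to-ZCTA step might be sketched in base R as below. The `contbr_zip`, `zip`, and `ZCTA_USE` column names come from the data frame in this report, but the `crosswalk` table here is a toy with made-up coverage, not the downloaded file, and truncating ZIP+4 values to five digits is an assumption about how `zip` was derived.

```r
# Toy contributor zips: ZIP+4 values, a plain 5-digit zip, and a blank.
contribs <- data.frame(
  contbr_zip = c("800144038", "805043508", "80504", ""),
  stringsAsFactors = FALSE
)

# Assumption: the 5-digit zip is the first five characters of contbr_zip.
contribs$zip <- substr(contribs$contbr_zip, 1, 5)

# Toy crosswalk (hypothetical contents); a real one has a row per zip code.
crosswalk <- data.frame(
  zip      = c("80014", "80504"),
  ZCTA_USE = c("80014", "80504"),
  stringsAsFactors = FALSE
)

# Left join: keep every contribution, with NA where the zip has no ZCTA match.
contribs <- merge(contribs, crosswalk, by = "zip", all.x = TRUE)
print(contribs)
```

Rows that come back with `NA` in `ZCTA_USE` are worth counting before mapping, since they drop out of the choropleth.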
The best return on time spent was adding political party: I used Wikipedia to manually build out a crosswalk for those, and it was very useful in graphing and grouping the data.